279 research outputs found

    Exploiting Gene-Environment Independence for Analysis of Case-Control Studies: An Empirical Bayes Approach to Trade Off between Bias and Efficiency

    Get PDF
    Standard prospective logistic regression analysis of case-control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, modern ``retrospective\u27\u27 methods, including the celebrated ``case-only\u27\u27 approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel approach to analyze case-control data that can relax the gene-environment independence assumption using an empirical Bayes (EB) framework. In the special case, involving a binary gene and a binary exposure, the framework leads to an estimator of the odds-ratio interaction parameter in a simple closed form that corresponds to an weighted average of the standard case-only and case-control estimators. We also describe a general approach for deriving the EB estimator and its variances within the retrospective maximum-likelihood framework developed by Chatterjee and Carroll (2005). We conduct simulation studies to investigate the mean-squared-error of the proposed estimator in both fixed and random parameter settings. We also illustrate the application of this methodology using two real data examples. Both simulated and real data examples suggest that the proposed estimator strikes an excellent balance between bias and efficiency depending on the true nature of the gene-environment association and the sample size for a given study

    Exploiting Gene-Environment Independence for Analysis of Case-Control Studies: An Empirical Bayes Approach to Trade Off between Bias and Efficiency

    Get PDF
    Standard prospective logistic regression analysis of case-control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern ā€œretrospectiveā€ methods, including the ā€œcase-onlyā€ approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel approach to analyze case-control data that can relax the gene-environment independence assumption using an empirical Bayes framework. In the special case, involving a binary gene and a binary exposure, the framework leads to an estimator of the odds-ratio interaction parameter in a simple closed form that corresponds to an weighted average of the standard case-only and case-control estimators. We also describe a general approach for deriving the empirical Bayes estimator and its variance within the retrospective maximum-likelihood framework developed by Chatterjee and Carroll (2005). We conduct simulation studies to investigate the mean-squared-error of the proposed estimator in both fixed and random parameter settings. We also illustrate the application of this methodology using two real data examples. Both simulated and real data examples suggest that the proposed estimator strikes an excellent balance between bias and efficiency depending on the true nature of the gene-environment association and the sample size for a given study

    A note on bias due to fitting prospective multivariate generalized linear models to categorical outcomes ignoring retrospective sampling schemes

    Get PDF
    Outcome dependent sampling designs are commonly used in economics, market research and epidemiological studies. Case-control sampling design is a classic example of outcome dependent sampling, where exposure information is collected on subjects conditional on their disease status. In many situations, the outcome under consideration may have multiple categories instead of a simple dichotomization. For example, in a case-control study, there may be disease sub-classification among the ā€œcasesā€ based on progression of the disease, or in terms of other histological and morphological characteristics of the disease. In this note, we investigate the issue of fitting prospective multivariate generalized linear models to such multiple-category outcome data, ignoring the retrospective nature of the sampling design. We first provide a set of necessary and sufficient conditions for the link functions that will allow for equivalence of prospective and retrospective inference for the parameters of interest. We show that for categorical outcomes, prospective-retrospective equivalence does not hold beyond the generalized multinomial logit link. We then derive an approximate expression for the bias incurred when link functions outside this class are used. We illustrate the extent of bias through a real data example, based on the ongoing Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial by the National Cancer Institute

    A note on bias due to fitting prospective multivariate generalized linear models to categorical outcomes ignoring retrospective sampling schemes

    Get PDF
    AbstractOutcome-dependent sampling designs are commonly used in economics, market research and epidemiological studies. Case-control sampling design is a classic example of outcome-dependent sampling, where exposure information is collected on subjects conditional on their disease status. In many situations, the outcome under consideration may have multiple categories instead of a simple dichotomization. For example, in a case-control study, there may be disease sub-classification among the ā€œcasesā€ based on progression of the disease, or in terms of other histological and morphological characteristics of the disease. In this note, we investigate the issue of fitting prospective multivariate generalized linear models to such multiple-category outcome data, ignoring the retrospective nature of the sampling design. We first provide a set of necessary and sufficient conditions for the link functions that will allow for equivalence of prospective and retrospective inference for the parameters of interest. We show that for categorical outcomes, prospectiveā€“retrospective equivalence does not hold beyond the generalized multinomial logit link. We then derive an approximate expression for the bias incurred when link functions outside this class are used. Most popular models for ordinal response fall outside the multiplicative intercept class and one should be cautious while performing a naive prospective analysis of such data as the bias could be substantial. We illustrate the extent of bias through a real data example, based on the ongoing Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial by the National Cancer Institute. The simulations based on the real study illustrate that the bias approximations work well in practice

    Exploiting Gene-Environment Independence for Analysis of Caseā€“Control Studies: An Empirical Bayes-Type Shrinkage Estimator to Trade-Off between Bias and Efficiency

    Full text link
    Standard prospective logistic regression analysis of caseā€“control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern ā€œretrospectiveā€ methods, including the ā€œcase-onlyā€ approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel empirical Bayes-type shrinkage estimator to analyze caseā€“control data that can relax the gene-environment independence assumption in a data-adaptive fashion. In the special case, involving a binary gene and a binary exposure, the method leads to an estimator of the interaction log odds ratio parameter in a simple closed form that corresponds to an weighted average of the standard case-only and caseā€“control estimators. We also describe a general approach for deriving the new shrinkage estimator and its variance within the retrospective maximum-likelihood framework developed by Chatterjee and Carroll (2005, Biometrika 92, 399ā€“418). Both simulated and real data examples suggest that the proposed estimator strikes a balance between bias and efficiency depending on the true nature of the gene-environment association and the sample size for a given study.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/65511/1/j.1541-0420.2007.00953.x.pd

    Bayesian semiparametric analysis for two-phase studies of gene-environment interaction

    Full text link
    The two-phase sampling design is a cost-efficient way of collecting expensive covariate information on a judiciously selected subsample. It is natural to apply such a strategy for collecting genetic data in a subsample enriched for exposure to environmental factors for gene-environment interaction (G x E) analysis. In this paper, we consider two-phase studies of G x E interaction where phase I data are available on exposure, covariates and disease status. Stratified sampling is done to prioritize individuals for genotyping at phase II conditional on disease and exposure. We consider a Bayesian analysis based on the joint retrospective likelihood of phases I and II data. We address several important statistical issues: (i) we consider a model with multiple genes, environmental factors and their pairwise interactions. We employ a Bayesian variable selection algorithm to reduce the dimensionality of this potentially high-dimensional model; (ii) we use the assumption of gene-gene and gene-environment independence to trade off between bias and efficiency for estimating the interaction parameters through use of hierarchical priors reflecting this assumption; (iii) we posit a flexible model for the joint distribution of the phase I categorical variables using the nonparametric Bayes construction of Dunson and Xing [J. Amer. Statist. Assoc. 104 (2009) 1042-1051].Comment: Published in at http://dx.doi.org/10.1214/12-AOAS599 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Design Issues for Generalized Linear Models: A Review

    Full text link
    Generalized linear models (GLMs) have been used quite effectively in the modeling of a mean response under nonstandard conditions, where discrete as well as continuous data distributions can be accommodated. The choice of design for a GLM is a very important task in the development and building of an adequate model. However, one major problem that handicaps the construction of a GLM design is its dependence on the unknown parameters of the fitted model. Several approaches have been proposed in the past 25 years to solve this problem. These approaches, however, have provided only partial solutions that apply in only some special cases, and the problem, in general, remains largely unresolved. The purpose of this article is to focus attention on the aforementioned dependence problem. We provide a survey of various existing techniques dealing with the dependence problem. This survey includes discussions concerning locally optimal designs, sequential designs, Bayesian designs and the quantile dispersion graph approach for comparing designs for GLMs.Comment: Published at http://dx.doi.org/10.1214/088342306000000105 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Semiparametric Bayesian Analysis of Caseā€“Control Data under Conditional Gene-Environment Independence

    Full text link
    In caseā€“control studies of gene-environment association with disease, when genetic and environmental exposures can be assumed to be independent in the underlying population, one may exploit the independence in order to derive more efficient estimation techniques than the traditional logistic regression analysis ( Chatterjee and Carroll, 2005 , Biometrika 92, 399ā€“418). However, covariates that stratify the population, such as age, ethnicity and alike, could potentially lead to nonindependence. In this article, we provide a novel semiparametric Bayesian approach to model stratification effects under the assumption of gene-environment independence in the control population. We illustrate the methods by applying them to data from a population-based caseā€“control study on ovarian cancer conducted in Israel. A simulation study is conducted to compare our method with other popular choices. The results reflect that the semiparametric Bayesian model allows incorporation of key scientific evidence in the form of a prior and offers a flexible, robust alternative when standard parametric model assumptions do not hold.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/65893/1/j.1541-0420.2007.00750.x.pd

    Bayesian Analysis of Timeā€Series Data under Caseā€Crossover Designs: Posterior Equivalence and Inference

    Full text link
    Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/102132/1/biom12102.pdfhttp://deepblue.lib.umich.edu/bitstream/2027.42/102132/2/biom12102-sm-0001-SupInfo-S1.pd
    • ā€¦
    corecore